Abstract

We consider the change-point detection problem of deciding, based on noisy measurements, whether an unknown
signal over a given graph is constant or is instead piecewise constant over two connected induced subgraphs of
relatively low cut size. We analyze the corresponding generalized likelihood ratio (GLR) statistic and relate it
to the problem of finding a sparsest cut in a graph. We develop a tractable relaxation of the GLR statistic based
on the combinatorial Laplacian of the graph, which we call the spectral scan statistic, and analyze its properties.
We show how its performance as a testing procedure depends directly on the spectrum of the graph, and use this
result to explicitly derive its asymptotic properties on a few significant graph topologies. Finally, we demonstrate both
theoretically and by simulations that the spectral scan statistic can outperform naive testing procedures based on edge
thresholding and $\chi^2$ testing.



1 Introduction 

In this article we are concerned with the basic but fundamental task of deciding whether a given graph, over which
a noisy signal is observed, contains a cluster of anomalous or activated nodes comprising an induced connected
subgraph. Such a problem is highly relevant in a variety of scientific areas, such as community detection in social
networks, surveillance, disease outbreak detection, biomedical imaging, sensor network detection, gene network
analysis, environmental monitoring and malware detection. Recent theoretical contributions in the statistical literature
(see, e.g., Arias-Castro et al. [2005, 2008, 2011], Addario-Berry et al. [2010]) have detailed the inherent difficulty of
such a testing problem in relatively simplified settings and under specific conditions on the graph topology. From a
practical standpoint, the natural algorithm for detection of anomalous clusters of activity in graphs is the
generalized likelihood ratio test (GLRT) or scan statistic, a computationally intensive procedure that entails scanning all
well-connected clusters and testing individually for anomalous activation. Unfortunately, its performance over general
graphs is not well understood, and little attention has been paid to determining alternative, computationally tractable
procedures.

In this article we assume that the class of clusters of activation consists of subgraphs of small cut size. We believe
this is a natural and realistic assumption which, as we demonstrate below, allows us to explicitly incorporate into the
detection problem the properties of the graph topology through its spectrum. In particular, we show that the GLRT
is an integer program with a term in the objective that corresponds to the sparsest cut in a graph, a known NP-hard
problem. With this in mind, we propose a relaxation of the GLRT, called the spectral scan statistic, which is based
on the combinatorial Laplacian of the graph and, importantly, is a tractable program. As our main result, we derive
theoretical guarantees for the performance of the spectral scan statistic, which hold for any graph and are based on
the spectral measure of the combinatorial Laplacian. For comparison purposes, we derive theoretical guarantees for
two simple estimators, the edge thresholding and the $\chi^2$ test. We conclude our study by applying the main result to
balanced binary trees, the lattice, and Kronecker graphs, giving us precise asymptotic results. We find that, modulo
logarithmic terms, the spectral scan statistic has nearly optimal power for balanced binary trees. Simulations for these
models verify that the spectral scan statistic dominates the simple estimators.

Contributions. Our contributions are as follows. (1) We define a new class of activation patterns based on the
notion of small cut size that reflects in a natural way the topological properties of the graph. (2) We analyze the
corresponding GLR statistic and show that it is indeed related to the problem of finding sparsest cuts. We then develop
a computationally tractable relaxation of the GLR statistic, called the spectral scan statistic, and analyze its properties.
In our main theoretical result, we show that the performance of the spectral scan statistic depends explicitly
on the spectral properties of the graph. (3) Using such results we are able to characterize in a very explicit form the
performance of the spectral scan statistic on a few notable graph topologies and demonstrate its superiority over naive
detectors, such as the edge thresholding and the $\chi^2$ test. (4) Finally, we have formulated the detection problem under
more general and realistic scenarios, which involve composite null and alternative hypotheses as opposed to the simple
hypotheses that are customary in the theoretical statistical literature on this subject.

Related Work. Normal means testing in high dimensions is a well-established and fundamental problem in statistics
(see, e.g., Ingster and Suslina [2003]). A significant portion of the recent work in this area (Arias-Castro et al.
[2005, 2008, 2011], Addario-Berry et al. [2010]) has focused on incorporating structural assumptions on the signal,
as a way to mitigate the effect of high dimensionality and also because many real-life problems can be represented as
instances of the normal means problem with graph-structured signals (see, for an example, Jacob et al. [2010]). These
contributions have considered the generalized likelihood ratio test of means when the alternative hypothesis takes on
the form of a combinatorial space. However, the performance of such tests has been analyzed only for certain types of
graphs, and it is unclear to what extent those analyses extend to general graph topologies. Moreover, while much is
known about the theoretical performance of the GLRT, no mention is made of its computational feasibility. Another
line of research relevant to our problem is optimal fault detection with nuisance parameters and matched subspace
detection in the signal processing literature: see, e.g., Scharf and Friedlander [1994], Baygun and Hero [1995],
Fouladirad and Nikiforov [2005], Fouladirad et al. [2008]. Though our problem can be cast as a special case of the
more general problem of optimal testing of a linear subspace under nuisance parameters considered in that line of
work, the focus on a graph-structured signal, as well as the type of analysis based on the interplay between the scan
statistics and the spectral properties of the graph contained in our work, are novel.



1.1 Problem Setup 

We now formalize the problem of detecting a change of signal over the vertices of a graph from noisy observations
in the high-dimensional setting. For a given connected, undirected, possibly weighted graph $G = (V, E)$ on $|V| = n$
nodes, we observe one realization of the random vector

$$y = \beta + \epsilon, \qquad (1)$$

where $\beta \in \mathbb{R}^V$ and $\epsilon \sim N(0, \sigma^2 I_n)$, with $\sigma^2$ known. We will assume that there are two groups of constant activation
for the signal $\beta$, namely that there exists a subset $C \subset V$ such that $\beta$ is constant within both $C$ and its complement
$\bar C = V \setminus C$. We formalize this assumption by writing

$$\beta = \mu \mathbf{1} + \delta \mathbf{1}_C, \qquad (2)$$

where $\mu, \delta \in \mathbb{R}$ are unknown parameters, $\mathbf{1} \in \mathbb{R}^V$ is the $n$-dimensional vector of ones and $\mathbf{1}_C$ is the indicator function
of the subset $C$. The parameter $\mu$ can be thought of as the magnitude of the background signal and is a nuisance
parameter, while $\delta$ quantifies the gap in signal between the two clusters. Setting $\bar\beta = \mathbf{1}\mathbf{1}^\top \beta / n$, we will use $\|\beta - \bar\beta\|$
to measure the energy of the signal (note that this quantity is independent of $\mu$), and we will define the signal-to-noise
ratio (SNR) to be

$$\frac{\|\beta - \bar\beta\|}{\sigma} = \sqrt{\frac{|C||\bar C|}{n}}\, \frac{|\delta|}{\sigma}.$$
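As a minimal sketch of the observation model (1)-(2), the following NumPy snippet draws one realization of $y$ and checks the closed form of the SNR numerically. The function names `simulate` and `snr`, the seed, and the toy values of $n$, $C$, $\mu$, $\delta$ and $\sigma$ are illustrative assumptions, not part of the original study.

```python
import numpy as np

def simulate(n, cluster, mu, delta, sigma, rng):
    """Return (beta, y) with beta = mu*1 + delta*1_C and y = beta + N(0, sigma^2 I_n)."""
    indicator = np.zeros(n)
    indicator[list(cluster)] = 1.0
    beta = mu + delta * indicator
    return beta, beta + sigma * rng.standard_normal(n)

def snr(beta, sigma):
    """||beta - beta_bar|| / sigma, where beta_bar is the mean of beta."""
    return np.linalg.norm(beta - beta.mean()) / sigma

rng = np.random.default_rng(0)               # fixed seed, illustrative only
n, cluster, delta, sigma = 12, range(4), 2.0, 1.0
beta, y = simulate(n, cluster, mu=3.0, delta=delta, sigma=sigma, rng=rng)
size_c = len(cluster)
# Closed form: sqrt(|C| |C^c| / n) * |delta| / sigma
closed_form = np.sqrt(size_c * (n - size_c) / n) * abs(delta) / sigma
print(snr(beta, sigma), closed_form)         # the two values agree
```

Changing `mu` leaves the SNR unchanged, which is the sense in which $\mu$ is a nuisance parameter.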

We will not assume any knowledge of the true clustering $(C, \bar C)$, other than that it belongs to a given class $\mathcal{C}$ of
bi-partitions $(C, \bar C)$ of $V$ such that $C$ and $\bar C$ are both large and can be easily disconnected, in that they have low cut
size. Formally, we define, for some $\rho > 0$,

$$\mathcal{C} = \mathcal{C}(\rho) = \left\{ C \subset V,\ C \neq \emptyset : \frac{|\partial C|}{|C||\bar C|} \le \frac{\rho}{n} \right\}, \qquad (3)$$

where $\partial C = \{(i,j) \in E : i \in C, j \in \bar C\}$ is the boundary of $C$. Note that $\mathcal{C}$ is a symmetric class in the sense that
$C \in \mathcal{C}$ if and only if $\bar C \in \mathcal{C}$. We are interested in the problem of testing whether the gap parameter $\delta$ in equation (2)
is zero (i.e. the signal $\beta$ is constant) or it is non-zero for some $C \in \mathcal{C}$, regardless of the value of $\mu$. Thus, we can
naturally cast our structured change-point detection problem as the following composite hypothesis testing problem:

$$H_0 : \beta \in \Theta_0 \quad \text{vs} \quad H_1 : \beta \in \Theta_1, \qquad (4)$$

where $\Theta_0 = \{\mathbf{1}\mu,\ \mu \in \mathbb{R}\}$ and $\Theta_1 = \{\mathbf{1}\mu + \mathbf{1}_C \delta,\ \mu \in \mathbb{R},\ \delta \in \mathbb{R} \setminus \{0\},\ C \in \mathcal{C}\}$. Notice that the alternative can be written
as the union over $\mathcal{C}$ of disjoint composite alternatives of the form $H_1^C : \beta \in \Theta_1^C := \{\mathbf{1}\mu + \mathbf{1}_C \delta,\ \mu \in \mathbb{R},\ \delta \in \mathbb{R} \setminus \{0\}\}$,
$C \in \mathcal{C}$.

To make our analysis meaningful, we measure the difficulty of the detection problem in terms of the energy
parameter by assuming that, for some $\eta > 0$, $\|\beta - \bar\beta\| \ge \eta$ for all $\beta \in \Theta_1$. Thus, we can think of $\eta$ as the minimal
degree of separation between the null and alternative hypotheses. Below we will analyze asymptotic conditions under
which the hypothesis testing problem described above is feasible, in a sense made precise in the next definition, when
the size of the graph $n$ increases unboundedly. To this end, we will further assume that the relevant parameters of the
model, $\eta$, $\sigma$, $\delta$ and $\rho$, change with $n$ as well, even though we will not make such dependence explicit in our notation
for ease of readability. Our results establish conditions for asymptotic distinguishability as a function of the SNR $\eta/\sigma$,
$\rho$ and the spectrum of the graph $G$.

Definition 1. Let $P_\theta$ denote the distribution of $y$ induced by the model (1), where $\theta \in \Theta_0 \cup \Theta_1$. For a given statistic
$S(y)$ and threshold $\tau \in \mathbb{R}$, let $T = T(y)$ be 1 if $S(y) > \tau$ and 0 otherwise. We say that the hypotheses $H_0$ and $H_1$
are asymptotically distinguished by the test $T$ if

$$\sup_{\theta \in \Theta_0} P_\theta\{T = 1\} \to 0 \quad \text{and} \quad \sup_{\theta \in \Theta_1} P_\theta\{T = 0\} \to 0, \qquad (5)$$

where the limit is taken as $n \to \infty$. We say that $H_0$ and $H_1$ are asymptotically indistinguishable if there does not
exist any test for which the above limits hold.



Notation. We will need some mathematical terminology from algebraic graph theory (Godsil et al. [2001]). A
central object in our analysis is the combinatorial Laplacian matrix $L = D - W$, where $W = (I\{(v, w) \in E\})_{v,w \in V}$
is the adjacency matrix of the graph $G$ and $D = \mathrm{diag}\{d_v\}_{v \in V}$ is the diagonal matrix of node degrees,
$d_v = \sum_{w \in V} W_{v,w}$, $v \in V$. If the graph is weighted, then $W_{v,w}$ is the corresponding edge weight. We will denote the
eigenvalues of $L$ by $\{\lambda_i\}_{i=1}^n$, which we will always take in increasing order. Since $G$ is connected, the smallest
eigenvalue is $\lambda_1 = 0$, with corresponding eigenvector $\mathbf{1}$. The second eigenvalue $\lambda_2$ is known as the algebraic
connectivity and is lower bounded by $4[n\,\mathrm{diam}(G)]^{-1}$, where $\mathrm{diam}(G)$ is the diameter of the graph. Throughout this
study we use Bachmann-Landau notation for asymptotic statements: if $a_n / b_n \to 0$ then $a_n = o(b_n)$ and $b_n = \omega(a_n)$.
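The objects in this notation can be sketched numerically as follows. The path graph, the helper names `combinatorial_laplacian` and `path_adjacency`, and the value of $n$ are illustrative assumptions chosen only because the diameter of a path is known in closed form.

```python
import numpy as np

def combinatorial_laplacian(W):
    """L = D - W for a symmetric (possibly weighted) adjacency matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def path_adjacency(n):
    """Adjacency matrix of the path graph on n nodes (toy example)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0
    return W

n = 8
L = combinatorial_laplacian(path_adjacency(n))
eigvals = np.linalg.eigvalsh(L)            # eigenvalues in increasing order
lambda_1, lambda_2 = eigvals[0], eigvals[1]
diameter = n - 1                           # diameter of the path graph
lower_bound = 4.0 / (n * diameter)         # 4 / (n * diam(G))
print(lambda_1, lambda_2, lower_bound)
```

On this example the smallest eigenvalue is zero up to rounding and the algebraic connectivity sits above the diameter bound.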



2 Methods 



The hypothesis testing problem at hand presents two challenges: (1) the model contains an unbounded nuisance
parameter $\mu \in \mathbb{R}$ and (2) the alternative hypothesis is comprised of a finite disjoint union of composite hypotheses
indexed by $\mathcal{C}$. These features set our problem apart from virtually all existing work on structured normal means
problems (see, e.g., Arias-Castro et al. [2005, 2008, 2011], Addario-Berry et al. [2010]), which does not consider
nuisance parameters and relies on a simplified framework consisting of a simple null hypothesis and a composite
alternative consisting of disjoint unions of simple alternatives. Having nuisance parameters and composite hypotheses
requires a more sophisticated analysis.

We will eliminate the interference caused by the nuisance parameter by considering test procedures that are
independent of $\mu$. The formal justification for this choice is based on the theory of optimal invariant hypothesis testing
(see, e.g., Lehmann and Romano [2005]) and of uniformly best constant power tests (see Wald [1943]). Due to space
limitations we will not provide the details and refer the reader to Fouladirad et al. [2008], Fouladirad and Nikiforov
[2005], Fillatre and Nikiforov [2007], Fillatre [2012], Scharf and Friedlander [1994], Baygun and Hero [1995] and
references therein for in-depth treatments of these issues related to the model at hand.
For the simpler problem of testing $H_0$ versus $H_1^C$ for some $C \subset V$, the optimal test is based on the likelihood ratio
(LR) statistic (see the proof of Lemma 2 below for a derivation)

$$2 \log \Lambda_C(y) = 2 \log \left( \frac{\sup_{\theta \in \Theta_1^C} f_\theta(y)}{\sup_{\theta \in \Theta_0} f_\theta(y)} \right) = \frac{1}{\sigma^2} \frac{|V|}{|C||\bar C|} \left( \sum_{v \in C} \tilde y_v \right)^2, \qquad (6)$$

where $\tilde y = y - \bar y = (\tilde y_v, v \in V)$ and $f_\theta$ is the Lebesgue density of $P_\theta$. This test rejects $H_0$ for large values of $\Lambda_C(y)$.
Optimality follows from the fact that the statistical model we consider has the monotone likelihood ratio property.

When testing against composite alternatives, as in our case, it is customary to consider instead the generalized
likelihood ratio (GLR) statistic, which in our case reduces to

$$\hat g = \max_{C \in \mathcal{C}(\rho)} 2\sigma^2 \log \Lambda_C(y).$$

Through manipulations of the likelihoods, we find that the GLR statistic has a very convenient form which is tied to
the spectral properties of the graph $G$ via its Laplacian.

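Lemma 2. Let $\tilde y = y - \mathbf{1}\left(\frac{1}{n} \sum_{v \in V} y_v\right)$ and $K = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$. Then

$$\hat g = \max_{x \in \{0,1\}^n} \frac{x^\top \tilde y \tilde y^\top x}{x^\top K x} \quad \text{s.t.} \quad \frac{x^\top L x}{x^\top K x} \le \rho, \qquad (7)$$

where $L$ is the combinatorial Laplacian of the graph $G$.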
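To make the integer program in Lemma 2 concrete, here is a brute-force sketch that enumerates all $x \in \{0,1\}^n$ on a tiny graph. It is exponential in $n$ and only meant as an illustration; the path graph, the toy signal, the value of $\rho$, and the function names are assumptions made for this example.

```python
import itertools
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of the path graph on n nodes (toy example)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def glr_bruteforce(y, L, rho):
    """Maximize x^T y~ y~^T x / x^T K x over x in {0,1}^n with x^T L x / x^T K x <= rho."""
    n = len(y)
    y_tilde = y - y.mean()
    K = np.eye(n) - np.ones((n, n)) / n
    best = -np.inf                          # stays -inf if no cut is sparse enough
    for bits in itertools.product((0.0, 1.0), repeat=n):
        x = np.array(bits)
        denom = x @ K @ x                   # equals |C||C^c|/n for x = 1_C
        if denom < 1e-12:                   # x = 0 or x = 1: not a bipartition
            continue
        if (x @ L @ x) / denom <= rho + 1e-12:
            best = max(best, (x @ y_tilde) ** 2 / denom)
    return best

y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # noiseless jump in the middle
g_hat = glr_bruteforce(y, path_laplacian(6), rho=0.7)
print(g_hat)
```

With $\rho = 0.7$ on the 6-node path, the only feasible bipartition is the balanced split in the middle, so the maximum is attained at the true change-point.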

The proof is provided in the appendix. The savvy reader will notice the connection between (7) and the graph
sparsest cut program. By Lagrangian duality, we see that the program (7) is equivalent to (for some Lagrangian
parameter $\nu$)

$$\min_{C \subset V} \frac{|\partial C|}{|C||\bar C|} - \nu \frac{\left(\sum_{i \in C} \tilde y_i\right)^2}{|C||\bar C|},$$

the first term of which is precisely the sparsest cut objective, and the second term drives the solution $C$ to have
positive within-cluster empirical correlations. The sparsest cut program is known to be NP-hard, with polynomial-time
algorithms known for trees and planar graphs (Matula and Shahrokhi [1990]). Because of this fact, approximation
algorithms have been proposed over the past two decades, most notably the uniform multicommodity flow approach
(Leighton and Rao [1988], Shmoys [1997]) and the semi-definite relaxation of the cut metric (Arora et al. [2009]).
Hagen and Kahng [1992] observed that the minimum cut sparsity is bounded by the algebraic connectivity ($\lambda_2$),
suggesting the Fiedler vector (i.e. the second eigenvector of $L$) as an appropriate relaxation of the characteristic vector
of the cut. Moreover, the well-known Cheeger inequality shows that the minimum cut sparsity (in a regular graph) is
bounded by the algebraic connectivity (see Chung [2004]). We will follow the tradition of bounding sparsity with the
algebraic connectivity, and provide a surrogate estimator to the scan statistic based on this simple spectral relaxation.
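Proposition 3. Define the Spectral Scan Statistic (SSS) as

$$\hat s = \sup_{x \in \mathbb{R}^n} (x^\top y)^2 \quad \text{s.t.} \quad x^\top L x \le \rho,\ \|x\| \le 1,\ x^\top \mathbf{1} = 0.$$

Then the GLR statistic is bounded by the SSS: $\hat g \le \hat s$.

Proof. First let us notice that $K = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the projection onto the subspace orthogonal to $\mathbf{1}$. Because $K$ is thus
idempotent, $\tilde y^\top \mathbf{1} = 0$, and $L\mathbf{1} = 0$, we can rewrite

$$\hat g = \max_{x \in \{0,1\}^n \setminus \{0, \mathbf{1}\}} \frac{(Kx)^\top \tilde y \tilde y^\top (Kx)}{(Kx)^\top (Kx)} \quad \text{s.t.} \quad \frac{(Kx)^\top L (Kx)}{(Kx)^\top (Kx)} \le \rho.$$

So, we have the following relaxation,

$$\hat g \le \max_{x \neq 0,\ x^\top \mathbf{1} = 0} \frac{x^\top \tilde y \tilde y^\top x}{x^\top x} \quad \text{s.t.} \quad \frac{x^\top L x}{x^\top x} \le \rho \quad = \hat s.$$

□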
Remark 4. By Lagrangian duality and the Courant-Fischer theorem, the spectral scan statistic can be written as

$$\hat s = \min_{\nu \ge 0}\ \chi(\tilde y \tilde y^\top - \nu L) + \nu \rho,$$

where $\chi(A)$ is the maximum non-zero eigenvalue of the matrix $A$.

Notice that because the domain $X = \{x \in \mathbb{R}^n : x^\top L x \le \rho,\ \|x\| \le 1,\ x^\top \mathbf{1} = 0\}$ is symmetric around the origin,
$\hat s$ is precisely the square of the solution to

$$\sqrt{\hat s} = \sup_{x \in \mathbb{R}^n} x^\top y \quad \text{s.t.} \quad x^\top L x \le \rho,\ \|x\| \le 1,\ x^\top \mathbf{1} = 0, \qquad (8)$$

where we have used the fact that $x^\top \tilde y = ((I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top)x)^\top y = x^\top y$ because $x^\top \mathbf{1} = 0$ within $X$. This
formulation shows that the SSS is related to the supremum of a Gaussian process over $X$. This fact will turn out to be
extremely convenient, as we show next.
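One way to evaluate the program (8) numerically is sketched below. It is not taken from the text: it assumes the standard Lagrangian-duality identity $\hat s = \min_{\theta \in [0,1]} \sum_{i>1} w_i^2 / (\theta \lambda_i / \rho + 1 - \theta)$, where $w$ collects the coordinates of $y$ in the eigenbasis of $L$ with the constant eigenvector removed. The right hand side is a convex function of $\theta$, minimized here by golden-section search. The function names, the path graph and the toy signal are assumptions for illustration.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of the path graph on n nodes (toy example)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def spectral_scan_statistic(y, L, rho, iters=100):
    """s_hat = (sup x^T y)^2 over {x^T L x <= rho, ||x|| <= 1, x^T 1 = 0}, G connected."""
    eigvals, eigvecs = np.linalg.eigh(L)
    lam = eigvals[1:]                      # positive eigenvalues lambda_2, ..., lambda_n
    w = eigvecs[:, 1:].T @ y               # coordinates of y orthogonal to the ones vector

    def h(theta):
        # Squared support function of the ellipsoid
        # {z : z^T (theta * Lambda / rho + (1 - theta) * I) z <= 1}, which contains X.
        return float(np.sum(w ** 2 / (theta * lam / rho + (1.0 - theta))))

    a, b = 0.0, 1.0
    ratio = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):                 # golden-section search on the convex h
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if h(c) < h(d):
            b = d
        else:
            a = c
    return min(h(0.5 * (a + b)), h(0.0), h(1.0))

y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
L = path_laplacian(6)
s_hat = spectral_scan_statistic(y, L, rho=0.7)
print(s_hat)
```

Every value of $\theta$ gives an upper bound on $\hat s$, so the search can only overestimate the statistic slightly, never underestimate it. The single eigendecomposition dominates the cost, which is the sense in which the relaxation is tractable.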



3 Theoretical Analysis 

We first derive a simple condition for asymptotic indistinguishability based on testing the null versus a single
component in the alternative. A more refined analysis of the lower bound for the general hypothesis (4) is beyond the scope
of this article.

Theorem 5. Suppose that there exists $C \in \mathcal{C}$ such that $|C| / |\bar C| \asymp 1$. Then $H_0$ and $H_1$ are asymptotically indistinguishable
if $\eta / \sigma = o(1)$.

The proof is in the appendix. We will analyze the performance of the SSS by relying on its representation
(8) as the square of the supremum of a Gaussian process. We draw heavily on the theory of generic chaining,
perfected in Talagrand [2005], which essentially reduces the problem of computing bounds on the expected supremum
of Gaussian processes to geometric properties of their index space. Recall that, under the alternative hypothesis,
$\|\beta - \bar\beta\| \ge \eta$ uniformly over $\Theta_1$.

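Theorem 6. The following hold with probability at least $1 - \delta$. Under the null $H_0$,

$$\sqrt{\hat s} \le \sqrt{2\sigma^2 \sum_{i>1} \min\{1, \rho \lambda_i^{-1}\}} + \sqrt{2\sigma^2 \log \frac{2}{\delta}},$$

while under the alternative $H_1$,

$$\sqrt{\hat s} \ge \eta - \sqrt{2\sigma^2 \log \frac{2}{\delta}}.$$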



Proof. We use generic chaining to control the process $\{x^\top y\}_{x \in X}$ appearing in the SSS. First, we notice that the index
set $X$ is the intersection of an ellipsoid and the unit ball, which is the intuition behind the following lemma.

Lemma 7. Let $L$ have spectrum $\{\lambda_i\}_{i=1}^n$. Then under $H_0$,

$$E\sqrt{\hat s} \le \sqrt{2\sigma^2 \sum_{i>1} \min\{1, \rho \lambda_i^{-1}\}}.$$

The proof is provided in the appendix. We can then use the well-known phenomenon that the supremum of a
Gaussian process concentrates around its expectation (see the appendix). Hence, by Lemma 14 the first statement in
Theorem 6 holds. The second statement follows by applying standard concentration results to the univariate Gaussian
$(K\beta)^\top y / \|K\beta\|$ and noticing that $K\beta / \|K\beta\| \in X$ and $E[(K\beta)^\top y] / \|K\beta\| = \|\beta - \bar\beta\| \ge \eta$ under $H_1$. □

As a corollary we will provide sufficient conditions for asymptotic distinguishability that depend on the spectrum 
of the Laplacian L. As we will show in the next section, these conditions can be applied to a number of graph 
topologies whose spectral properties are known. 
Corollary 8. The null and alternative, as described in Thm. 6, are asymptotically distinguished by $\hat s$ and $\hat g(y)$ if

$$\frac{\eta}{\sigma} = \omega\left( \sqrt{\sum_{i>1} \min\{1, \rho \lambda_i^{-1}\}} \right). \qquad (9)$$



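Another, stronger sufficient condition is

$$\frac{\eta}{\sigma} = \omega\left( \sqrt{k + (n - k)\frac{\rho}{\lambda_{k+1}}} \right) \qquad (10)$$

if $k$ is large enough that $\lambda_{k+1} > \rho$.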

Proof. To see equation (9) we note that, due to Theorem 6, if

$$\sqrt{2\sigma^2 \sum_{i>1} \min\{1, \rho \lambda_i^{-1}\}} + \sqrt{2\sigma^2 \log \frac{2}{\delta}} = o\left( \eta - \sqrt{2\sigma^2 \log \frac{2}{\delta}} \right)$$

then we attain asymptotic distinguishability by choosing any threshold $\tau$ between, and sufficiently far from, the left
and right hand sides of the previous display. To show equation (10) we note that by choosing $k$ such that $\lambda_{k+1} > \rho$ we
see that

$$\sum_{i>1} \min\{1, \rho \lambda_i^{-1}\} \le k + \sum_{i>k} \min\{1, \rho \lambda_i^{-1}\} \le k + (n - k)\frac{\rho}{\lambda_{k+1}}.$$

Interestingly, there are no logarithmic terms in (9) of the kind that usually accompany uniform bounds of this type,
which we attribute to the generic chaining. Notice that the right hand side of (9) is always at most $\sqrt{n - 1}$, which we will see
characterizes the performance of the naive estimator $\|\tilde y\|$.
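The spectral quantity $\sqrt{\sum_{i>1} \min\{1, \rho\lambda_i^{-1}\}}$ is cheap to compute for any given graph. The sketch below evaluates it on a path graph and compares it with $\sqrt{n-1}$; the choice of graph, the value of $\rho$ and the function names are illustrative assumptions.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of the path graph on n nodes (toy example)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def spectral_rate(L, rho):
    """sqrt(sum_{i>1} min{1, rho / lambda_i}) for a connected graph with Laplacian L."""
    lam = np.linalg.eigvalsh(L)[1:]        # drop lambda_1 = 0
    return float(np.sqrt(np.sum(np.minimum(1.0, rho / lam))))

n = 64
L = path_laplacian(n)
rate = spectral_rate(L, rho=0.05)
print(rate, np.sqrt(n - 1))                # spectral rate vs. the sqrt(n - 1) ceiling
```

When $\rho$ exceeds the largest eigenvalue every term in the sum equals 1 and the rate reaches $\sqrt{n-1}$; smaller $\rho$ gives a strictly smaller rate.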

For comparison, we consider the performance of two naive procedures for detection: the energy detector, which
rejects $H_0$ if $\|\tilde y\|^2$ is too large, and the edge thresholding detector, which rejects $H_0$ if $\max_{(v,w) \in E} |y_v - y_w|$ is large.

Theorem 9. $H_0$ and $H_1$ are asymptotically distinguished by $\|\tilde y\|$ if and only if

$$\frac{\eta}{\sigma} = \omega(\sqrt{n - 1}).$$

The proof (given in the appendix) is a standard $\chi^2$ analysis. In Sharpnack et al. [2012] the authors examined
the problem of exact recovery of cluster boundaries in the graph-structured normal means problem by taking
differences between observations corresponding to adjacent nodes. The following result stems from Theorem 2.1 of
Sharpnack et al. [2012], and the fact that $|C||\bar C|/n$ scales like $\min\{|C|, |\bar C|\}$ up to a factor of 2.

Theorem 10. $H_0$ and $H_1$ are asymptotically distinguished by $\max_{(v,w) \in E} |y_v - y_w|$ if

$$\frac{\eta}{\sigma} = \omega\left( \sqrt{\max_{C \in \mathcal{C},\ |C| \le n/2} |C| \log n} \right).$$

If $\mathcal{C}$ contains balanced clusters, i.e. bipartitions $(C, \bar C)$ such that $|C| / |\bar C| \asymp 1$, then this result matches the scaling in
Theorem 9 up to a log factor.
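Both naive statistics are simple to compute. The following sketch evaluates them on a toy signal; the function names, the edge-list representation of the graph and the example data are assumptions for illustration.

```python
import numpy as np

def energy_statistic(y):
    """||y~||^2, the squared norm of the centered observations."""
    y_tilde = y - y.mean()
    return float(y_tilde @ y_tilde)

def edge_threshold_statistic(y, edges):
    """max over edges (v, w) of |y_v - y_w|."""
    return max(abs(y[v] - y[w]) for v, w in edges)

y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
edges = [(i, i + 1) for i in range(5)]     # path graph on 6 nodes (toy example)
energy = energy_statistic(y)
edge_max = edge_threshold_statistic(y, edges)
print(energy, edge_max)
```

Neither statistic uses the class $\mathcal{C}$: the energy detector ignores the graph entirely, and edge thresholding looks only at individual edges.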



4 Specific Graph Models 

In this section we demonstrate the power and flexibility of Theorem 6 by analyzing in detail the performance of the
spectral scan statistic over three important graph topologies: balanced binary trees, the 2-dimensional lattice and the
Kronecker graphs (see Leskovec and Faloutsos [2007], Leskovec et al. [2010]).



4.1 Balanced Binary Trees 



We begin the analysis of the spectral scan statistic by applying it to the balanced binary tree (BBT) of depth $\ell$. The
class of signals that we will consider have clusters of constant signal which are subtrees of size at least $cn^\alpha$ for
$0 < c < 1/2$, $0 < \alpha < 1$. Hence, the cut size of the signals is 1 and $\rho = [cn^\alpha(1 - cn^{\alpha - 1})]^{-1}$.

Corollary 11. For the balanced binary tree with $n$ vertices, the spectral scan statistic can asymptotically distinguish
$H_0$ from signals with $\rho = n[cn^\alpha(n - cn^\alpha)]^{-1}$ if the SNR is stronger than

— = ui(n 2 log n). 
(j 

We simulate the probability of correct discovery of change-points (rejecting $H_0$ when the truth is $H_1$) versus the
probability of false alarm (falsely rejecting $H_0$). These are given for the four estimators in Figure 1 and for the SSS as
$n = 2^{\ell+1} - 1$ increases. In these simulations a subtree at level 2 (of size $n/4$) was chosen as $C$, the gap-to-noise ratio
is fixed at $\delta/\sigma = 0.8$, and $\rho = 4/n$. We see that even in the low $n$ regime, exploiting the graph structure is essential to
improve the power of testing $H_0$ against $H_1$. As $n$ increases with $\delta/\sigma$ fixed, the performance of the SSS improves
dramatically.



[Figure 1 panels. Upper-panel legend: SSS, edge thresholding, energy, unconstrained GLRT. Lower-panel legends: n = 63, 127, 255, 511; n = 6, 36, 216, 1296.]



Figure 1: Above: the simulated probability of correct discovery (power) against false alarm (size) of the SSS compared to the 
energy detector, edge thresholding and the unconstrained GLRT of the BBT (left), Lattice (middle), and Kronecker graph (right). 
Below: the performance as n increases. 



4.2 Lattice 

We will analyze the performance guarantees of the SSS over the 2-dimensional lattice graph with $p$ vertices along
each dimension ($n = p^2$). We will assume that $\rho = Cn^{-1/2}$, as this is the cut sparsity of rectangles that have a
low surface area to volume ratio. By a simple Fourier analysis (see Sharpnack and Singh [2010]), we know that the
Laplacian eigenvalues are $2(2 - \cos(2\pi i_1/p) - \cos(2\pi i_2/p))$ for all $i_1, i_2 \in [p]$. We will appeal to (10). Because
$1 - \cos(2\pi i_1/p) \asymp (2\pi i_1/p)^2$ for $i_1 \ll p$, if we index the eigenvalues by $i = (i_1, i_2)$ for $i_1, i_2 \in [p]$ then
$\lambda_{(i_1, i_2)} \asymp (i_1^2 + i_2^2)/n$, up to constant factors. Hence,

$$k \approx |\{(i_1, i_2) : i_1^2 + i_2^2 \lesssim n\lambda_{k+1}\}| \le |\{i_1 : i_1^2 \lesssim n\lambda_{k+1}\}|^2 \asymp \lceil n\lambda_{k+1} \rceil.$$

Then by choosing $\lambda_{k+1} \asymp \sqrt{\rho}$ the term under the root in (10) is bounded by $\lceil n\lambda_{k+1} \rceil + n\rho/\lambda_{k+1} \asymp n\sqrt{\rho} \asymp n^{3/4}$,
modulo lower order terms. We arrive at the following conclusion.

Corollary 12. For the $p \times p$ square lattice, the spectral scan statistic can asymptotically distinguish $H_0$ from signals
with cut sparsity $\rho = Cn^{-1/2}$ if the SNR is stronger than

$$\frac{\eta}{\sigma} = \omega(n^{3/8}).$$
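The eigenvalue formula used in the lattice analysis can be checked numerically. The sketch below assumes periodic boundary conditions (a $p \times p$ torus), for which the cosine formula with argument $2\pi i/p$ is exact; that boundary assumption, the value of $p$ and the function name `cycle_laplacian` are choices made for this example rather than details stated in the text.

```python
import numpy as np

def cycle_laplacian(p):
    """Combinatorial Laplacian of the cycle graph on p >= 3 nodes."""
    L = 2.0 * np.eye(p)
    idx = np.arange(p)
    L[idx, (idx + 1) % p] -= 1.0
    L[idx, (idx - 1) % p] -= 1.0
    return L

p = 5
Lc = cycle_laplacian(p)
# Laplacian of the p x p torus: L_cycle (x) I + I (x) L_cycle
L_torus = np.kron(Lc, np.eye(p)) + np.kron(np.eye(p), Lc)
numeric = np.sort(np.linalg.eigvalsh(L_torus))

i1, i2 = np.meshgrid(np.arange(p), np.arange(p))
formula = np.sort((2 * (2 - np.cos(2 * np.pi * i1 / p)
                          - np.cos(2 * np.pi * i2 / p))).ravel())
print(np.max(np.abs(numeric - formula)))   # numerically zero
```

The two sorted spectra coincide up to rounding, and the small eigenvalues grow like $(i_1^2 + i_2^2)/n$ as used in the counting argument.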

We demonstrate the improvement of the SSS over competing tests in Figure 1. In these simulations a $\sqrt{n}/2 \times \sqrt{n}/2$
square was chosen to be $C$ with $\rho = 4/\sqrt{n}$. Despite the weaker guarantee in Corollary 12, the SSS demonstrates the
importance of exploiting the graph structure.



4.3 Kronecker Graphs 

Much of the research in complex networks has focused on observing statistical phenomena that are common across many
data sources. The most notable of these are that the degree distribution obeys a power law (Faloutsos et al. [1999])
and that networks are often found to have small diameter (Milgram [1967]). A class of graphs that satisfy these properties,
while providing a simple modelling platform, are the Kronecker graphs (see Leskovec and Faloutsos [2007], Leskovec et al.
[2010]). Let $H_1$ and $H_2$ be graphs on $p$ vertices with Laplacians $L_1, L_2$ and edge sets $E_1, E_2$ respectively. The
Kronecker product, $H_1 \otimes H_2$, is the graph over vertices $[p] \times [p]$ such that there is an edge $((i_1, i_2), (j_1, j_2))$ if $i_1 = j_1$
and $(i_2, j_2) \in E_2$, or $i_2 = j_2$ and $(i_1, j_1) \in E_1$. We will construct graphs that have a multi-scale topology using the
Kronecker product. Let the multiplication of a graph by a scalar indicate that we multiply each edge weight by that
scalar. First let $H$ be a connected graph with $p$ vertices. Then the graph $G$ for $\ell > 0$ levels is defined as

$$G = \frac{1}{p^{\ell-1}} H \otimes \frac{1}{p^{\ell-2}} H \otimes \cdots \otimes \frac{1}{p} H \otimes H.$$

The choice of multipliers ensures that it is easier to make cuts at the coarser scales. Notice that all of the previous
results hold for weighted graphs.
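The multi-scale construction can be sketched in a few lines. Under the product rule stated above (an edge whenever one coordinate agrees and the other is an edge of the factor), the Laplacian of $H_1 \otimes H_2$ is $L_1 \otimes I + I \otimes L_2$ in the matrix Kronecker sense. The function names are assumptions of this sketch; the base graph, two triangles joined by a single edge, is taken from the simulation setup reported for this model.

```python
import numpy as np

def laplacian_from_edges(p, edges):
    """Unweighted combinatorial Laplacian on p vertices from an edge list."""
    W = np.zeros((p, p))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    return np.diag(W.sum(axis=1)) - W

def product_laplacian(L1, L2):
    """Laplacian of the product graph: L1 (x) I + I (x) L2."""
    n1, n2 = L1.shape[0], L2.shape[0]
    return np.kron(L1, np.eye(n2)) + np.kron(np.eye(n1), L2)

def multiscale_laplacian(L_H, levels):
    """Laplacian of (1/p^(l-1)) H (x) ... (x) (1/p) H (x) H for l = levels."""
    p = L_H.shape[0]
    L = L_H / p ** (levels - 1)            # coarsest scale gets the smallest weights
    for j in range(levels - 2, -1, -1):
        L = product_laplacian(L, L_H / p ** j)
    return L

# Base graph: two triangles joined by a single edge (p = 6).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
L_H = laplacian_from_edges(6, edges)
L_G = multiscale_laplacian(L_H, levels=2)
print(L_G.shape)
```

Scaling a graph by a scalar scales its Laplacian by the same scalar, which is why the weights can be applied directly to `L_H` before taking products.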

Corollary 13. Let $G$ be the Kronecker product graph described above with $n = p^\ell$ vertices. The spectral scan statistic
can asymptotically distinguish $H_0$ from signals with cuts within the $k$ coarsest scales ($\rho \propto p^{2k - \ell - 1}$) if the SNR is
stronger than

"l=u{p 2 (l + 2)n( 2k+1 V l ) 
a 

The proof and an explanation of $\rho$ are in the appendix. Again, we demonstrate the improvement of the SSS over
competing tests in Figure 1. For these simulations the base graph $H$ was chosen to be two triangles ($K_3$) connected
by a single edge ($p = 6$). At the coarsest scale one of the $K_3$ subgraphs was chosen to be $C$ with $\rho = 4/n$.



5 Discussion 

We studied the heretofore unaddressed problem of how to tractably detect change-points in networks under Gaussian
noise. To this end we developed the spectral scan statistic, suggesting it as a computationally feasible alternative to
the GLRT. We characterized the performance of the SSS for any graph in terms of the spectrum of the
combinatorial Laplacian. For comparison purposes, we developed theoretical guarantees for two simple estimators.
We applied the main result to three graph models: balanced binary trees, the lattice and Kronecker graphs. We see that
not only is it statistically inadmissible to ignore graph structure, but for the balanced binary tree the SSS gives near optimal
performance. This claim is backed by both simulation and theory.
6 Appendix



6.1 Proofs in Section 2 

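Proof of Lemma 2. To expedite the proof, we express the LR statistic in terms of the sufficient statistics
$\bar y_0 = \frac{1}{|\bar C|} \sum_{i \in \bar C} y_i \sim N(\beta_0, \sigma_0^2)$ and $\bar y_1 = \frac{1}{|C|} \sum_{i \in C} y_i \sim N(\beta_1, \sigma_1^2)$ for $\sigma_0 = \sigma/\sqrt{|\bar C|}$ and $\sigma_1 = \sigma/\sqrt{|C|}$. Then,
we obtain

$$2 \log \Lambda_C(y) = \frac{1}{\sigma_0^2}(\bar y_0 - \hat\beta)^2 + \frac{1}{\sigma_1^2}(\bar y_1 - \hat\beta)^2,$$

where $\hat\beta = \frac{\sigma_0^{-2}}{\sigma_0^{-2} + \sigma_1^{-2}} \bar y_0 + \frac{\sigma_1^{-2}}{\sigma_0^{-2} + \sigma_1^{-2}} \bar y_1$ is the MLE under $H_0$. (The likelihood under the alternative balances with the
normalizing constant of the null likelihood.) Thus,

$$\begin{aligned}
2 \log \Lambda_C(y) &= \frac{1}{\sigma_0^2} \left( \frac{\sigma_1^{-2}}{\sigma_0^{-2} + \sigma_1^{-2}} (\bar y_0 - \bar y_1) \right)^2 + \frac{1}{\sigma_1^2} \left( \frac{\sigma_0^{-2}}{\sigma_0^{-2} + \sigma_1^{-2}} (\bar y_0 - \bar y_1) \right)^2 \\
&= \frac{(\bar y_0 - \bar y_1)^2}{\sigma_0^2 + \sigma_1^2} = \frac{1}{\sigma^2} \frac{|C||\bar C|}{|V|} (\bar y_0 - \bar y_1)^2 \\
&= \frac{1}{\sigma^2} \frac{|V|}{|C||\bar C|} \left( \frac{|\bar C|}{|V|} \mathbf{1}_C^\top y - \frac{|C|}{|V|} \mathbf{1}_{\bar C}^\top y \right)^2 \\
&= \frac{1}{\sigma^2} \frac{|V|}{|C||\bar C|} \left( \sum_{v \in C} \tilde y_v \right)^2.
\end{aligned}$$

Now we let $x = \mathbf{1}_C$, making the statistic above

$$2\sigma^2 \log \Lambda_C(y) = \frac{x^\top \tilde y \tilde y^\top x}{x^\top K x}, \quad \text{and} \quad \frac{|\partial C||V|}{|C||\bar C|} = \frac{x^\top L x}{x^\top K x}.$$

The result now follows by considering all the indicator functions corresponding to the sets in $\mathcal{C}$. □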
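For completeness, the identities behind the substitution $x = \mathbf{1}_C$ can be verified directly. The second line below assumes unit edge weights; for a weighted graph $|\partial C|$ is replaced by the total weight of the cut.

```latex
\begin{align*}
x^\top K x &= x^\top x - \tfrac{1}{n}(\mathbf{1}^\top x)^2
            = |C| - \frac{|C|^2}{n} = \frac{|C|\,|\bar C|}{|V|}, \\
x^\top L x &= \sum_{(i,j) \in E} (x_i - x_j)^2 = |\partial C|, \\
x^\top \tilde y &= \sum_{v \in C} \tilde y_v .
\end{align*}
```

Dividing the squared third line by the first recovers the right hand side of (6) multiplied by $\sigma^2$, and dividing the second line by the first gives the constraint ratio.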



6.2 Proofs in Section 3 

Proof of Theorem 5. Let the true $C \in \mathcal{C}$ be known. The performance of the optimal test with $C$ known, which by
the Neyman-Pearson Lemma is based on $2 \log \Lambda_C(y)$, bounds the performance of that with $C$ unknown. To this end,
note that, under $H_0$, the LR statistic (6) has a $\chi^2_1$ distribution, while under the alternative $H_1^C$ it has a $\chi^2_1(\lambda)$ distribution with
non-centrality parameter

$$\lambda = \frac{\delta^2}{\sigma^2} \frac{|C||\bar C|}{|V|} = \frac{\eta^2}{\sigma^2},$$

which is the square of the SNR. For fixed $C$, asymptotic indistinguishability of $H_0$ versus $H_1^C$ follows by considering
any threshold and noticing that the associated type 1 and type 2 errors are non-vanishing under the SNR scaling
assumed in the statement. Since the risk of testing $H_0$ versus $H_1$ is no smaller than the risk of testing $H_0$ versus $H_1^C$,
the result follows. □

We remark that the proof of the previous result shows that when distinguishing $H_0$ from $H_1^C$, the power of the test
is maximal when $|C| = |\bar C|$ for a fixed value of the gap-to-noise ratio $\delta/\sigma$.

Proof of Lemma 7. Without loss of generality, let $y \sim N(0, I_n)$. We recall that, since $G$ is connected, the combinatorial
Laplacian $L$ is symmetric, its smallest eigenvalue is zero and the remaining eigenvalues are positive. By the spectral
theorem, we can write $L = U \Lambda U^\top$, where $\Lambda$ is an $(n-1) \times (n-1)$ diagonal matrix containing the positive eigenvalues
of $L$ in increasing order and the columns of the $n \times (n-1)$ matrix $U$ are the associated eigenvectors. Then, since
each vector $x \in \mathbb{R}^n$ with $\mathbf{1}^\top x = 0$ can be written as $Uz$ for a unique vector $z \in \mathbb{R}^{n-1}$, we have

$$\begin{aligned}
X &= \{x \in \mathbb{R}^n : x^\top L x \le \rho,\ x^\top x \le 1,\ \mathbf{1}^\top x = 0\} \\
&= \{Uz \in \mathbb{R}^n : z \in \mathbb{R}^{n-1},\ z^\top U^\top L U z \le \rho,\ z^\top U^\top U z \le 1\} \\
&= \{Uz \in \mathbb{R}^n : z \in \mathbb{R}^{n-1},\ \tfrac{1}{\rho} z^\top \Lambda z \le 1,\ z^\top z \le 1\},
\end{aligned}$$

where in the third identity we have used the fact that $U^\top U = I_{n-1}$. Letting $Z = \{z \in \mathbb{R}^{n-1} : \frac{1}{\rho} z^\top \Lambda z \le 1,\ z^\top z \le 1\}$,
we see that

$$\sup_{x \in X} x^\top y = \sup_{z \in Z} z^\top U^\top y \overset{d}{=} \sup_{z \in Z} z^\top \xi,$$

where $\xi \sim N(0, I_{n-1})$ and $\overset{d}{=}$ denotes equality in distribution.

Next, we show that the set $Z$, which is the intersection of an ellipsoid with the unit ball in $\mathbb{R}^{n-1}$, is contained in
an enlarged ellipsoid. The supremum of the Gaussian process $z^\top \xi$ over $Z$ will then be bounded by the supremum
of the same process over this larger but simpler set, which we will be able to bound directly using a result from
Talagrand [2005] based on chaining. To this end, let $A = \frac{1}{\rho}\Lambda = \mathrm{diag}\{a_i\}_{i=1}^{n-1}$ and $d = \max\{j : a_j \le 1\}$. For
a vector $z \in \mathbb{R}^{n-1}$ set $z_1 = z_{[d]}$, $z_2 = z_{[n-1] \setminus [d]}$, and $A_2 = \mathrm{diag}\{a_i\}_{i>d}$. Then, we observe the following chain of
implications, holding for vectors $z \in \mathbb{R}^{n-1}$:

$$\|z\| \le 1,\ z^\top A z \le 1 \;\Rightarrow\; \|z_1\| \le 1,\ \sum_{i>d} a_i z_i^2 \le 1 \;\Rightarrow\; z_1^\top z_1 + z_2^\top A_2 z_2 \le 2 \;\Rightarrow\; \sum_{i=1}^{n-1} \frac{\max\{1, a_i\}}{2} z_i^2 \le 1.$$

Hence, we have the bound

$$E\sqrt{\hat s} \le E \sup_{z} z^\top \xi \quad \text{s.t.} \quad \sum_{i=1}^{n-1} \tfrac{1}{2} \max\{1, a_i\} z_i^2 \le 1.$$

Recalling that $a_i = \lambda_{i+1}/\rho$, for $i = 1, \ldots, n-1$, where $\lambda_{i+1}$ is the $(i+1)$th eigenvalue of $L$, by Proposition 2.2.1 in
Talagrand [2005] the right hand side of the previous expression is bounded by $\sqrt{2 \sum_{i>1} \min\{1, \rho \lambda_i^{-1}\}}$.

□



Supplement to the proof of Theorem 6. The following property of Gaussian processes effectively reduces the study of
their supremum to the study of its expectation. It was established by Borell [1975] and Cirelson et al. [1976] and can
be found in Ledoux [2001].

Lemma 14. Consider a Gaussian process $\{Z_t\}_{t \in U}$ where $U$ is compact with respect to the metric

$$d(s, t) = (E(Z_s - Z_t)^2)^{1/2}, \quad s, t \in U,$$

and let $\sigma^2 \ge \sup_{t \in U} E Z_t^2$. We have that with probability at least $1 - \delta$

$$\left| \sup_{t \in U} Z_t - E \sup_{t \in U} Z_t \right| < \sqrt{2\sigma^2 \log \frac{2}{\delta}}.$$

Notice that the natural distance is given by $d(x_0, x_1) = (E((x_0 - x_1)^\top y)^2)^{1/2} = \sigma\|x_0 - x_1\|$ for $x_0, x_1 \in X$. □



Proof of Theorem ??. Recall that $\tilde{y} = Ky$, where $K = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the orthogonal projection matrix onto the $(n-1)$-dimensional linear subspace of vectors orthogonal to $\mathbf{1}$. Under $H_0$, $\tilde{y} \sim N(0, \sigma^2 K)$ and, therefore, $\|\tilde{y}\|^2/\sigma^2 \sim \chi^2_{n-1}$, since $\mathrm{tr}(K) = n - 1$. On the other hand, under $H_1^C$ for a fixed $C$, $\tilde{y} \sim N(K\beta, \sigma^2 K)$, where $\beta$ is given as in (2). Thus, under $H_1^C$, $\|\tilde{y}\|^2/\sigma^2 \sim \chi^2_{n-1}(\Lambda)$, where the non-centrality parameter is given by

$$\Lambda = \frac{1}{\sigma^2} \beta^\top K \beta = \frac{1}{\sigma^2} \beta^\top K^\top K \beta = \frac{1}{\sigma^2} \|\beta - \bar{\beta}\|^2 \ge \frac{\eta^2}{\sigma^2}, \qquad (12)$$

where the second identity is due to the fact that $K$ is symmetric and idempotent, and the last inequality is due to our assumption on the minimal separation $\eta$ between $H_0$ and any of the alternatives. Thus, if $\eta^2/\sigma^2 = \omega(\sqrt{n-1})$, then $\Lambda = \omega(\sqrt{n-1})$. Hence, using standard chi-square tail bounds (see for example Proposition 2 of Azizyan and Singh [2012]) and since the bound (12) holds uniformly over all $C \in \mathcal{C}$, it follows that the null and alternative are asymptotically distinguishable using the test statistic $\|\tilde{y}\|$ if and only if $\eta^2/\sigma^2 = \omega(\sqrt{n-1})$. □
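The algebra behind the non-centrality parameter can be checked numerically. The sketch below is a minimal illustration, assuming a hypothetical piecewise-constant signal `beta` and helper names that do not come from the paper. It verifies that $K$ is a symmetric idempotent matrix with trace $n-1$ and that $\beta^\top K \beta$ equals the squared norm of the centered signal.

```python
import numpy as np


def centering_projection(n):
    """K = I_n - (1/n) 1 1^T, the projection onto vectors orthogonal to 1."""
    return np.eye(n) - np.ones((n, n)) / n


def noncentrality(beta, sigma):
    """Lambda = beta^T K beta / sigma^2 = ||beta - mean(beta)||^2 / sigma^2."""
    beta = np.asarray(beta, dtype=float)
    K = centering_projection(beta.size)
    return float(beta @ K @ beta) / sigma ** 2


if __name__ == "__main__":
    # Hypothetical signal: 3 nodes at level 2, the remaining 5 nodes at level 0.
    beta = np.array([2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    centered = beta - beta.mean()
    print(noncentrality(beta, sigma=1.0), float(centered @ centered))
```

Both printed values agree, which is the identity $\beta^\top K^\top K \beta = \|\beta - \bar{\beta}\|^2$ used in (12).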
6.3 Proof in Section 4



Proof of Corollary ??. The study of the spectra of trees really began in earnest with the work of Fiedler [1975]. Notably, it became apparent that trees have eigenvalues with high multiplicities, particularly the eigenvalue 1. Molitierno et al. [2000] gave a tight bound on the algebraic connectivity of balanced binary trees (BBT). They found that for a BBT of depth $\ell$, the reciprocal of the smallest nonzero eigenvalue ($\lambda_2$) satisfies

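$$\frac{1}{\lambda_2} \le 2^{\ell} - 2\ell + 2 - \frac{2^{\ell} - \sqrt{2}\,(2\ell - 1 - 2^{\ell-1})}{2^{\ell} - 1 - \sqrt{2}\,(2^{\ell-1} - 1)} + \left(3 - 2\sqrt{2} \cos\left(\frac{\pi}{2\ell - 1}\right)\right)^{-1} \le 2^{\ell} + 105\, I\{\ell < 4\}. \qquad (13)$$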



Rojo [2002] gave a more exact characterization of the spectrum of a balanced binary tree, providing a decomposition of the Laplacian's characteristic polynomial. Specifically, the characteristic polynomial of $L$ is given by

$$\det(\lambda I - L) = p_1^{2^{\ell-2}}(\lambda)\, p_2^{2^{\ell-3}}(\lambda) \cdots p_{\ell-3}^{2^2}(\lambda)\, p_{\ell-2}^{2}(\lambda)\, p_{\ell-1}(\lambda)\, s_{\ell}(\lambda), \qquad (14)$$

where $s_\ell(\lambda)$ is a polynomial of degree $\ell$ and $p_i(\lambda)$ are polynomials of degree $i$ with the smallest root satisfying the bound in (13) with $\ell$ replaced with $i$. Rojo and Soto [2005] extended this work to more general balanced trees.

By (14), we know that at most $\ell + (\ell - 1) + (\ell - 2)2 + \cdots + (\ell - j)2^{j-1} \le \ell 2^{j}$ eigenvalues have reciprocals larger than $2^{\ell-j} + 105\, I\{j < 4\}$. Let $k = \max\{\lceil n^{1-\alpha} \rceil, 2^3\}$, then we have ensured that at most $k$ eigenvalues are smaller than $\rho$. For $n$ large enough

$$\sum_{i > 1} \min\{1, \rho \lambda_i^{-1}\} \le k + \rho \sum_{j > \log k}^{\ell} \ell\, 2^{j}\, 2^{\ell - j} = k + \ell(\ell - \log k)\, n \rho = O\left(n^{1-\alpha} (\log n)^2\right).$$

□ 
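The quantities in this proof can be inspected numerically on small trees. The sketch below is an illustration under stated assumptions: it builds a balanced binary tree with `levels` levels ($2^{\text{levels}} - 1$ vertices, heap-style indexing), which may differ from the paper's depth convention for $\ell$, and the function names and the choice of `rho` are hypothetical. It reports $1/\lambda_2$ and the truncated sum $\sum_{i>1} \min\{1, \rho \lambda_i^{-1}\}$.

```python
import numpy as np


def bbt_laplacian(levels):
    """Laplacian of a balanced binary tree with `levels` levels
    (n = 2**levels - 1 vertices; vertex c > 0 has parent (c - 1) // 2)."""
    n = 2 ** levels - 1
    L = np.zeros((n, n))
    for child in range(1, n):
        parent = (child - 1) // 2
        L[child, child] += 1.0
        L[parent, parent] += 1.0
        L[child, parent] -= 1.0
        L[parent, child] -= 1.0
    return L


def truncated_spectral_sum(L, rho):
    """sum_{i>1} min(1, rho / lambda_i) for a connected graph."""
    lam = np.linalg.eigvalsh(L)[1:]  # ascending order; drop lambda_1 = 0
    return float(np.sum(np.minimum(1.0, rho / lam)))


if __name__ == "__main__":
    for levels in range(2, 8):
        L = bbt_laplacian(levels)
        n = L.shape[0]
        lam2 = np.linalg.eigvalsh(L)[1]
        # rho = n^(-1/2) is an arbitrary illustrative choice (alpha = 1/2).
        print(levels, n, 1.0 / lam2, truncated_spectral_sum(L, n ** -0.5))
```

On these small trees the reciprocal of $\lambda_2$ grows roughly like $2^{\text{levels}}$, while the truncated sum stays well below $n - 1$.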

Proof of Corollary 13. The Kronecker product of two matrices $A, B \in \mathbb{R}^{n \times n}$ is defined as $A \otimes B \in \mathbb{R}^{(n \times n) \times (n \times n)}$ such that $(A \otimes B)_{(i_1, i_2), (j_1, j_2)} = A_{i_1, j_1} B_{i_2, j_2}$. Some matrix algebra shows that if $H_1$ and $H_2$ are graphs on $p$ vertices with Laplacians $L_1, L_2$ then the Laplacian of their Kronecker product, $H_1 \otimes H_2$, is given by $L = L_1 \otimes I_p + I_p \otimes L_2$ (Merris [1998]). Hence, if $v_1, v_2 \in \mathbb{R}^p$ are eigenvectors, viz. $L_1 v_1 = \lambda_1 v_1$ and $L_2 v_2 = \lambda_2 v_2$, then $L(v_1 \otimes v_2) = (\lambda_1 + \lambda_2)\, v_1 \otimes v_2$, where $v_1 \otimes v_2$ is the usual tensor product. This completely characterizes the spectrum of Kronecker products of graphs.
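The eigenvalue identity above is easy to confirm numerically. The following sketch assumes two small illustrative base graphs (a cycle and a complete graph on $p = 4$ vertices); the function names are hypothetical. It forms $L = L_1 \otimes I_p + I_p \otimes L_2$ with `numpy.kron` and checks that its spectrum is the set of pairwise sums of the base spectra.

```python
import numpy as np


def cycle_laplacian(p):
    """Laplacian of a cycle on p >= 3 vertices (an illustrative base graph)."""
    L = 2.0 * np.eye(p)
    for i in range(p):
        L[i, (i + 1) % p] -= 1.0
        L[(i + 1) % p, i] -= 1.0
    return L


def complete_laplacian(p):
    """Laplacian of the complete graph on p vertices."""
    return p * np.eye(p) - np.ones((p, p))


def product_laplacian(L1, L2):
    """L = L1 (x) I + I (x) L2, the form quoted from Merris [1998]."""
    p1, p2 = L1.shape[0], L2.shape[0]
    return np.kron(L1, np.eye(p2)) + np.kron(np.eye(p1), L2)


if __name__ == "__main__":
    L1, L2 = cycle_laplacian(4), complete_laplacian(4)
    L = product_laplacian(L1, L2)
    lam1, V1 = np.linalg.eigh(L1)
    lam2, V2 = np.linalg.eigh(L2)
    # All pairwise sums lambda_1 + lambda_2, sorted for comparison.
    sums = np.sort(np.add.outer(lam1, lam2).ravel())
    print(np.allclose(np.linalg.eigvalsh(L), sums))
    # A tensor product of eigenvectors is an eigenvector of L.
    v = np.kron(V1[:, 1], V2[:, 2])
    print(np.allclose(L @ v, (lam1[1] + lam2[2]) * v))
```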

We should argue the choice of $\rho \propto p^{2k-\ell-1}$ by showing that it is the result of cuts at level $k$. We say that an edge $e = ((i_1, \ldots, i_\ell), (j_1, \ldots, j_\ell))$ has scale $k$ if $i_k \neq j_k$. Furthermore, a cut has scale $k$ if each of its constituent edges has scale at least $k$. Each edge at scale $k$ has weight $p^{k-\ell}$ and there are $p^{\ell-1}$ such edges, so cuts at scale $k$ have total edge weight bounded by

t— ' p — 1 p — 1 

Cuts at scale $k$ leave components of size $p^{\ell-k}$ intact, meaning that $\rho \propto p^{2k-\ell-1}$ for large enough $p$.

We now control the spectrum of the Kronecker graph. Let the eigenvalues of the base graph $H$ be $\{\nu_j\}_{j=1}^{p}$ in increasing order. The eigenvalues of $G$ are precisely the sums

$$\lambda_{\mathbf{i}} = \frac{1}{p^{\ell-1}} \nu_{i_1} + \frac{1}{p^{\ell-2}} \nu_{i_2} + \cdots + \frac{1}{p} \nu_{i_{\ell-1}} + \nu_{i_\ell}$$

for $\mathbf{i} = (i_j)_{j=1}^{\ell} \subseteq [p]$. The eigenvalue distribution $\{\lambda_{\mathbf{i}}\}$ stochastically bounds

$$\nu_2\, p^{-Z(\mathbf{i})},$$

where $Z(\mathbf{i}) = \min\{j : \nu_{i_{\ell-j}} \neq 0\}$. Notice that if $\mathbf{i}$ is chosen uniformly at random then $Z(\mathbf{i})$ has a geometric distribution with probability of success $(p-1)/p$. Also $\rho / (\nu_2 p^{-Z(\mathbf{i})}) = p^{Z(\mathbf{i}) + 2k - \ell - 1} / \nu_2 \ge 1$ if $Z(\mathbf{i}) \ge \ell + 1 - 2k + \log_p \nu_2$, so



x p 2k-e-i LW^j^., (£ + 2)p 2fc -^ 1 
-r > mm l,-}< h > F < 

P € .tr(, A 4 v 2 f-^ v 2 p z p v 2 

This follows from the geometric probability mass function. We also know that the algebraic connectivity, $\nu_2$, is bounded from below by $4p^{-2}$, so the following result holds.